MDCT, 1024 discrete-time samples for example always result in 
1024 spectral values. 

It is known that the receptivity of the human ear depends on
the momentary spectrum of the audio signal itself. This
dependence is reflected in the so-called psychoacoustic model.
Using this model it has long been possible to calculate masking
thresholds as a function of the momentary spectrum. Masking
means that a particular tone or spectral portion is rendered
inaudible when, e.g., a neighbouring spectral region has a
relatively high energy. This phenomenon of masking is exploited
so as to quantize the post-transform spectral values as
coarsely as possible. The aim, therefore, is to avoid audible
disturbances in the decoded audio signal while using as few
bits as possible to code, or here to quantize, the audio
signal. The disturbances introduced by quantization, i.e. the
quantization noise, should lie below the masking threshold and
thus be inaudible. In known methods the spectral values are
therefore subdivided into so-called scale factor bands, which
are intended to reflect the frequency groups of the human ear.
The spectral values in a scale factor band are multiplied by a
scale factor, so that the band is scaled as a whole. The scaled
scale factor bands are then quantized, producing quantized
spectral values. Grouping into scale factor bands is not
essential. It is, however, used in the standards MPEG Layer 3
and MPEG-2 AAC (AAC = Advanced Audio Coding).
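
The band-wise scaling and quantization described above can be
sketched roughly as follows. This is an illustration only: the
function name, the band edges and the scale factors are
hypothetical, and plain rounding stands in for the quantizers
actually specified in the standards, which are more elaborate.

```python
def quantize_bands(spectrum, band_edges, scale_factors):
    """Scale each scale factor band as a whole, then round to integers.

    band_edges[b] and band_edges[b + 1] delimit band b (assumed layout).
    Plain rounding is an assumption, not the quantizer of any standard.
    """
    quantized = []
    for band, scale in enumerate(scale_factors):
        start, stop = band_edges[band], band_edges[band + 1]
        for value in spectrum[start:stop]:
            quantized.append(int(round(value * scale)))
    return quantized


# Eight made-up spectral values in two hypothetical bands of four values.
spectrum = [10.4, -7.9, 3.2, 0.6, 0.3, -0.2, 0.1, 0.05]
band_edges = [0, 4, 8]
scale_factors = [1.0, 4.0]

quantized = quantize_bands(spectrum, band_edges, scale_factors)
print(quantized)  # [10, -8, 3, 1, 1, -1, 0, 0]
```

A larger scale factor gives finer quantization in that band, a
smaller one coarser quantization and hence more quantization
noise, which is acceptable only while the noise stays below the
masking threshold.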

A very important aspect of data reduction is the entropy
coding of the quantized spectral values resulting from
quantization. Huffman coding is normally used for this. Huffman
coding is a variable-length coding, i.e. the length of the code
word for a value to be coded depends on the probability of this
value occurring. The most probable symbol is assigned the
shortest code word, so that very good redundancy reduction can
be achieved with Huffman coding. A universally known example of
a variable-length code is the Morse alphabet.
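
The relationship between probability and code word length can
be illustrated with a minimal sketch of the textbook Huffman
construction. The function name and the sample values are
assumptions chosen for illustration; the code tables used in
the standards are fixed tables and are not built this way at
run time.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return {symbol: bit string} for a Huffman code built from symbols."""
    counts = Counter(symbols)
    # Heap entries: (weight, unique tie breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs a one-bit code word.
        return {s: "0" for s in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees.
        w0, _, codes0 = heapq.heappop(heap)
        w1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


# Made-up quantized values: 0 is the most frequent, +/-2 the rarest.
values = [0] * 8 + [1] * 4 + [-1] * 2 + [2] + [-2]
code = huffman_code(values)
lengths = {s: len(c) for s, c in code.items()}
total_bits = sum(lengths[v] for v in values)
print(lengths)
print(total_bits)  # 30, compared with 48 for a fixed 3-bit code
```

The most frequent value receives a one-bit code word and the
two rarest values receive four-bit code words.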

In audio coding Huffman codes are used to code the quantized
spectral values. A modern audio coder which operates, e.g.,
according to the standard MPEG-2 AAC uses different Huffman
code tables, which are assigned to the spectrum section by
section according to particular criteria, to code the quantized
spectral values. Here 2 or 4 spectral values are always coded
together in one code word.
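
Coding 2 or 4 values together means that a whole tuple of
quantized values selects a single table entry. The following
sketch shows one way to do this; the helper names and the
positional index formula are assumptions made for illustration
and are not taken from the text above.

```python
def group_values(quantized, dimension):
    """Split quantized values into tuples of 2 or 4 adjacent values."""
    if dimension not in (2, 4):
        raise ValueError("dimension must be 2 or 4")
    if len(quantized) % dimension:
        raise ValueError("length must be a multiple of the dimension")
    return [tuple(quantized[i:i + dimension])
            for i in range(0, len(quantized), dimension)]


def tuple_index(values, max_abs):
    """Map a tuple with entries in [-max_abs, max_abs] to one table index.

    Hypothetical scheme: each value is treated as a digit in
    base 2 * max_abs + 1.
    """
    base = 2 * max_abs + 1
    index = 0
    for v in values:
        if abs(v) > max_abs:
            raise ValueError("value outside the table range")
        index = index * base + (v + max_abs)
    return index


quantized = [1, 0, -1, 0, 0, 0, 1, 1]
quads = group_values(quantized, 4)
pairs = group_values(quantized, 2)
print([tuple_index(t, 1) for t in quads])  # [64, 44]
print([tuple_index(t, 1) for t in pairs])  # [7, 1, 4, 8]
```

With four values per code word, the eight values above need two
code words; with two values per code word they need four.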

One way in which the method according to MPEG-2 AAC differs
from the MPEG Layer 3 method is that different scale factor
bands, i.e. different spectral values, are grouped into an
arbitrarily large number of spectral sections. In AAC a
spectral section contains at least four spectral values,
preferably more than four. The whole frequency range of the
spectral values is thus divided up into adjacent sections, each
of which represents a frequency band, so that all the sections
together cover the whole frequency range spanned by the
post-transform spectral values.

To achieve maximum redundancy reduction, a so-called Huffman
table, one of a number of such tables, is assigned to each
section, as in the MPEG Layer 3 method. In the bit stream of
the AAC method, which normally has 1024 spectral values, the
Huffman code words for the spectral values are arranged in
order of ascending frequency. The information on the table used
in each frequency section is transmitted in the side
information. This situation is shown in Fig. 2.
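
The arrangement just described, one table per section, code
words in ascending frequency order and the table choice carried
as side information, can be sketched as follows. The two toy
tables, the section layout and the function name are
hypothetical and are far smaller than anything a real coder
would use.

```python
# Hypothetical toy code tables: table id -> {pair of values: code word}.
TABLES = {
    1: {(0, 0): "0", (0, 1): "10", (1, 0): "110", (1, 1): "111"},
    2: {(0, 0): "00", (0, 1): "01", (1, 0): "10", (1, 1): "11"},
}


def encode_sections(quantized, sections):
    """Code each section with its own table, in ascending frequency order.

    sections is a list of (start, stop, table_id) covering the spectrum.
    Returns (side_info, code_words); side_info holds one
    (section length, table id) entry per section.
    """
    side_info = []
    code_words = []
    for start, stop, table_id in sections:
        side_info.append((stop - start, table_id))
        table = TABLES[table_id]
        for i in range(start, stop, 2):  # two values per code word
            code_words.append(table[(quantized[i], quantized[i + 1])])
    return side_info, code_words


quantized = [1, 1, 1, 0, 0, 0, 0, 1]
sections = [(0, 4, 2), (4, 8, 1)]  # two sections of four values each
side_info, code_words = encode_sections(quantized, sections)
print(side_info)    # [(4, 2), (4, 1)]
print(code_words)   # ['11', '10', '0', '10']
```

A decoder needs the side information first, since without the
table identifier of a section it cannot interpret the code
words of that section.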

In the case chosen to serve as an example in Fig. 2 the bit
stream comprises 10 Huffman code words. If each code word were
formed from one spectral value, 10 spectral values could be
coded here. Normally, however, 2 or 4 spectral values are coded
together in one code word, so that Fig. 2 represents a part of
the coded bit stream comprising 20 or 40 spectral values. In
the case where each Huffman code word comprises 2 spectral
values, the code word referenced by the number 1 represents the
first two spectral values. This code word is relatively short,
meaning that the values of the first two spectral values, i.e.
of the two lowest-frequency coefficients, occur relatively
often. The code word with the number 2, on the other hand, is
relatively long, meaning that the values of the third and
fourth spectral coefficients occur relatively infrequently in
the coded audio signal, which is why they are coded with a
relatively large number of bits. It can also be seen from
Fig. 2 that the values represented by the code words with the
numbers 3, 4 and 5, i.e. the spectral coefficients 5 and 6, 7
and 8, and 9 and 10, also occur relatively frequently, since
the individual code words are relatively short. Similar
considerations apply to the code words with the numbers 6-10.

As has already been mentioned, it is clear from Fig. 2 that,
in a bit stream generated by a known coding device, the Huffman
code words for the coded spectral values are arranged in
linearly ascending order of frequency.

A major disadvantage of Huffman codes in the case of
error-afflicted channels is error propagation. If it is
assumed, e.g., that the code word number 2 in Fig. 2 is
disturbed, there is a not insignificant probability that the
length of this erroneous code word number 2 will also be
changed and will thus differ from the correct length. If, in
the example of Fig. 2, the length of the code word number 2 has
been changed by a disturbance, it is no longer possible for a
decoder to deter-



